Research questions
In general, the nature of a research question depends on the goal and type of the analysis pursued (see Fig. 3.9 in Francom 2024):
| Type | Aims | Approach | Methods | Evaluation |
|---|---|---|---|---|
| Exploratory | Explore: gain insight | Inductive, data-driven, and iterative | Descriptive, pattern detection with machine learning (unsupervised) | Associative |
| Predictive | Predict: validate associations | Semi-deductive, data-/theory-driven, and iterative | Predictive modeling with machine learning (supervised) | Model performance, feature importance, and associative |
| Inferential | Explain: test hypotheses | Deductive, theory-driven, and non-iterative | Hypothesis testing with statistical tests | Causal |
EXPLORATORY Research questions
Question 1.1
Is there a pattern in the WBG project document corpus [1] that shows non-random variation in the incidence of certain words, phrases, or policy concepts [2] over time?
Hypothesis
The hypothesis being tested here is that the WBG project document corpus shows a non-random variation in the incidence of certain policy concepts over time.
The rationale is that the launch of a “policy slogan” carries an intrinsic motivation to shift the PDOs in a certain direction.
- This question will be handled in a data-driven way, i.e. starting from patterns observed in the text data and not from predetermined ideas.
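As a minimal sketch of what "incidence over time" could mean in practice, the snippet below computes, for each fiscal year, the share of PDO texts that mention a given term. The function name `term_share_by_year`, the `(fiscal_year, pdo_text)` record layout, and the toy records are all assumptions made for illustration; they are not taken from the actual WBG data or from the study's pipeline. Plain substring matching is used here for brevity, whereas a real analysis would likely tokenize and normalize the text first.

```python
from collections import defaultdict


def term_share_by_year(records, term):
    """Return {fiscal_year: share of PDO texts containing `term`}.

    `records` is an iterable of (fiscal_year, pdo_text) pairs
    (a hypothetical layout, assumed for this sketch).
    """
    totals = defaultdict(int)  # number of PDO texts per fiscal year
    hits = defaultdict(int)    # number of PDO texts mentioning the term
    needle = term.lower()
    for fiscal_year, pdo_text in records:
        totals[fiscal_year] += 1
        if needle in pdo_text.lower():  # naive, case-insensitive substring match
            hits[fiscal_year] += 1
    return {fy: hits[fy] / totals[fy] for fy in sorted(totals)}


# Toy records invented for illustration only (not real WBG data).
toy_records = [
    (2001, "Improve access to rural roads"),
    (2001, "Strengthen climate resilience of coastal towns"),
    (2002, "Support climate adaptation in agriculture"),
    (2002, "Enhance climate risk management"),
]

shares = term_share_by_year(toy_records, "climate")
print(shares)  # {2001: 0.5, 2002: 1.0}
```

Using shares rather than raw counts keeps years with many projects comparable to years with few. Whether the resulting variation is non-random would still need to be assessed separately.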
Question 1.2
Could the WDR [3] publications “explain”, or at least correlate with, the recurrence over time of said concepts?
Hypothesis
The “alternative” hypothesis being tested here is that the WDR has a “traction effect” on the PDOs of the following fiscal years (FYs).
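One simple, purely descriptive way to look for such a "traction effect" is to compare a concept's average incidence in the fiscal years after a WDR with its average in the years before. The sketch below assumes a hypothetical `{fiscal_year: share}` mapping as input; the function name `before_after_shift`, the window length, and the toy values are illustrative assumptions, not the study's method.

```python
from statistics import mean


def before_after_shift(share_by_year, event_year, window=3):
    """Mean share in the `window` years after `event_year` minus the mean
    share in the `window` years before it (the event year is excluded).

    Returns None when either side has no observations.
    """
    before = [share_by_year[y]
              for y in range(event_year - window, event_year)
              if y in share_by_year]
    after = [share_by_year[y]
             for y in range(event_year + 1, event_year + window + 1)
             if y in share_by_year]
    if not before or not after:
        return None
    return mean(after) - mean(before)


# Toy yearly shares invented for illustration only (not real WBG data).
toy_shares = {2008: 0.10, 2009: 0.10, 2010: 0.20, 2011: 0.30, 2012: 0.30}

# Hypothetical WDR on the concept published in 2010, two-year window.
shift = before_after_shift(toy_shares, event_year=2010, window=2)
print(shift)
```

A positive value means the concept appears more often after the publication year than before it. On its own this indicates an association at most, consistent with the exploratory framing of the question, and not a causal effect.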
Question 1.3
Since sector and theme tagging in the WBG project document corpus is very incomplete, is it possible to compensate for the missing tags using TOPIC MODELING?
Hypothesis
The hypothesis being tested here is that some ML techniques can help improve the quality of the “document data collection”, e.g. by compensating for the poor and incomplete sector/theme tagging of the WBG project documents.
- Note that for this purpose the available dataset (~20 fiscal years' worth of project PDO descriptions) has been split into training, validation, and test sets.
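A three-way split of this kind can be sketched as follows. The 70/15/15 proportions, the random shuffling, the fixed seed, and the name `split_dataset` are assumptions chosen for illustration; the source does not state how the actual split was made (it could, for instance, be done by fiscal year instead of at random).

```python
import random


def split_dataset(items, train_pct=70, validation_pct=15, seed=42):
    """Shuffle `items` reproducibly and split them into
    (training, validation, test) lists; the test set gets the remainder.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return training, validation, test


# Toy PDO texts invented for illustration only.
toy_pdos = [f"PDO text {i}" for i in range(20)]

train_set, validation_set, test_set = split_dataset(toy_pdos)
print(len(train_set), len(validation_set), len(test_set))  # 14 3 3
```

Keeping the test set untouched until the end gives an unbiased estimate of how well a model fills in missing sector/theme tags, while the validation set can be used for tuning.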
For the moment, the study’s aim is mainly to EXPLORE (e.g., trends over time in phrase occurrence), and possibly to PREDICT (e.g., using ML to enhance the quality of metadata variables). Possible follow-up questions will depend on the results of the exploratory questions above.
EXPLANATORY Research questions
…
PREDICTIVE Research questions
…
References
Footnotes
[1] The WBG project documents observed in this case are the short descriptive texts of the Project Development Objectives (PDO).
[2] Concepts encompass “policy focus”, “sector”, “strategy”, or “emerging priority” in the arena of funding for development …
[3] WDRs (World Development Reports) are the flagship reports of the World Bank Group and have been published annually since 1978.